Abstract


In educational tests, subscores are often generated from a portion of the items in a larger 
test. Guidelines based on mean-squared error are proposed to indicate whether subscores 
are worth reporting. Alternatives considered are direct reports of subscores, estimates of 
subscores based on total score, combined estimates based on subscores and total scores, and 
residual analysis of subscores. Applications are made to data from two testing programs.

A basic criterion for reporting of a subscore based on a portion of the items in a larger
test should be whether the subscore provides a more accurate measure of the construct 
it measures than is provided by the total score from the larger test. This standard for 
subscore reporting is readily handled by using classical test theory. Arguments are based 
on least squares and mean-squared error. Section 1 provides the basic theory required for 
the analysis. Section 2 considers some examples from testing programs at ETS. Section 3 
provides some conclusions from results of analysis. 

1 Mean-Squared Errors for True Subscores 

A true subscore can be estimated by use of the observed subscore, by use of the 
observed total score, or by a combination of the observed subscore and the observed total 
score. It is also possible to estimate the residual true subscore from regression of the true 
subscore on the true total score. 

Classical Test Theory Background 

To study these estimations, it is helpful to introduce some elementary classical test
theory. Let $A$ be an observed subscore, let $A_T$ be the corresponding true score, and let $A_E$
be the error $A - A_T$ of measurement. Let $E$ represent a mean, $\sigma^2$ represent a variance, and
$\sigma$ represent a standard deviation, so that $E(A)$ is the mean of $A$, $\sigma^2(A)$ is the variance of
$A$, and $\sigma(A)$ is the standard deviation of $A$. Under classical test theory (Lord & Novick,
1968; Holland & Hoskens, 2003), $A_T$ and $A_E$ are uncorrelated and $E(A_E) = 0$, so that
$E(A_T) = E(A)$ and

$$\sigma^2(A) = \sigma^2(A_T) + \sigma^2(A_E).$$

Similarly, let $B$ be an observed total score, let $B_T$ be the corresponding true score, and let
$B_E = B - B_T$ be the error of measurement, so that $E(B_E) = 0$, $E(B) = E(B_T)$, $B_T$ and
$B_E$ are uncorrelated, and

$$\sigma^2(B) = \sigma^2(B_T) + \sigma^2(B_E).$$

It is also the case that the measurement error $B_E$ is uncorrelated with the true score $A_T$
and the measurement error $A_E$ is uncorrelated with the true score $B_T$. Let $\sigma^2(A)$ and $\sigma^2(B)$
be assumed to be positive. Let Cov denote a covariance, and let $\rho$ denote a correlation, so
that $\mathrm{Cov}(A, A_T)$ is the covariance $\sigma^2(A_T)$ of $A$ and $A_T$, and

$$\rho(A, A_T) = \frac{\mathrm{Cov}(A, A_T)}{\sigma(A)\sigma(A_T)} = \frac{\sigma(A_T)}{\sigma(A)} \tag{1}$$


is the correlation of the observed score $A$ and the true score $A_T$. Let $\rho^2$ denote a squared
correlation, so that

$$\rho^2(A, A_T) = \frac{\sigma^2(A_T)}{\sigma^2(A)}$$

is the reliability coefficient of $A$. Similarly,

$$\rho^2(B, B_T) = \frac{\sigma^2(B_T)}{\sigma^2(B)}$$


is the reliability coefficient of $B$. Analysis of the subscore $A$ and the total score $B$ is affected
by the covariances

$$\mathrm{Cov}(A, B) = \sigma(A)\sigma(B)\rho(A, B)$$

and

$$\mathrm{Cov}(A_T, B_T) = \sigma(A_T)\sigma(B_T)\rho(A_T, B_T) = \sigma(A)\sigma(B)\rho(A, A_T)\rho(B, B_T)\rho(A_T, B_T). \tag{2}$$

The covariance and correlation of the true scores $A_T$ and $B_T$ may be determined by use
of measurement properties of the remainder test score $C = B - A$. Because the true scores
$A_T$ and $B_T$ are not correlated with the measurement errors $A_E$ and $B_E$, the true score of $C$
is $C_T = B_T - A_T$, the error of measurement $C_E = B_E - A_E$ of $C$ is uncorrelated with the
true score $C_T$, $E(C_T) = E(C)$, $E(C_E) = 0$,

$$\sigma^2(C) = \sigma^2(C_T) + \sigma^2(C_E),$$

and

$$\rho^2(C, C_T) = \frac{\sigma^2(C_T)}{\sigma^2(C)}.$$

One has

$$\mathrm{Cov}(A, B) = \frac{\sigma^2(A) + \sigma^2(B) - \sigma^2(C)}{2}$$

and

$$\mathrm{Cov}(A_T, B_T) = \frac{\sigma^2(A_T) + \sigma^2(B_T) - \sigma^2(C_T)}{2}.$$


In practice, estimates of the means $E(A)$ and $E(B)$, the standard deviations $\sigma(A)$,
$\sigma(A_E)$, $\sigma(B)$, and $\sigma(B_E)$, and the reliability coefficients $\rho^2(A, A_T)$ and $\rho^2(B, B_T)$ are readily
obtained from reports on testing programs produced at ETS. Given these estimates, $\sigma^2(A)$,
$\sigma^2(B)$, $\sigma^2(A_T)$, $\sigma(A_T)$, $\sigma^2(B_T)$, and $\sigma(B_T)$ are readily estimated. For example,

$$\sigma(A_T) = \sigma(A)\rho(A, A_T).$$
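As a minimal sketch of this step, the following Python function recovers the true-score and error standard deviations from a reported standard deviation and reliability coefficient. The function name `decompose` and the input values are illustrative assumptions and are not taken from any ETS report.

```python
from math import sqrt

def decompose(sd, reliability):
    """Split an observed-score SD into true-score and error SDs.

    Uses sigma(A_T) = sigma(A) * rho(A, A_T) together with
    sigma^2(A) = sigma^2(A_T) + sigma^2(A_E).
    """
    sd_true = sd * sqrt(reliability)
    sd_error = sd * sqrt(1.0 - reliability)
    return sd_true, sd_error

# Hypothetical summary statistics (assumed for illustration only).
sd_true, sd_error = decompose(sd=10.0, reliability=0.84)
print(round(sd_true, 2), round(sd_error, 2))
```

The two returned values always satisfy the variance decomposition, so either one can be checked against a reported standard error of measurement.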

Estimation of $\mathrm{Cov}(A, B)$ and $\mathrm{Cov}(A_T, B_T)$ is slightly more complicated, for it is not
necessarily the case that measurement properties of $C$ are directly available from reported
data. In typical cases, a test score $B$ is divided into $k \geq 2$ components $C_i$, $1 \leq i \leq k$, $A$ is
$C_h$ for some $h$ from 1 to $k$, and $B = \sum_{i=1}^{k} C_i$. Corresponding to each $C_i$ is a true score $C_{Ti}$
and an error $C_{Ei}$ such that

$$C_i = C_{Ti} + C_{Ei},$$

$E(C_{Ei}) = 0$, and $C_{Ei}$ and $C_{Ej}$ are uncorrelated for $i \neq j$. Data from standard summaries include
estimates for $\rho(C_i, C_j)$ and $\rho(C_{Ti}, C_{Tj})$ for $i \neq j$ and for $\sigma(C_{Ti})$, $\sigma(C_i)$, and $\rho^2(C_i, C_{Ti})$.
One may exploit the relationships

$$\mathrm{Cov}(A, B) = \sum_{i=1}^{k} \sigma(C_h)\sigma(C_i)\rho(C_h, C_i)$$

and

$$\mathrm{Cov}(A_T, B_T) = \sum_{i=1}^{k} \sigma(C_{Th})\sigma(C_{Ti})\rho(C_{Th}, C_{Ti}).$$

Naturally, if $i = h$, then

$$\sigma(C_h)\sigma(C_i)\rho(C_h, C_i) = \sigma^2(C_i)$$

and

$$\sigma(C_{Th})\sigma(C_{Ti})\rho(C_{Th}, C_{Ti}) = \sigma^2(C_{Ti}).$$
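The two sums can be sketched in Python as follows. The helper name `covariance_with_total` and the two-component test are hypothetical, chosen only so that the arithmetic is easy to follow; the observed correlation is derived under the stated assumption that errors of different components are uncorrelated.

```python
from math import sqrt

def covariance_with_total(sds, corr, h):
    """Cov(C_h, B) for B = sum of the components, from SDs and correlations."""
    return sum(sds[h] * sds[i] * corr[h][i] for i in range(len(sds)))

# Hypothetical two-component test: A = C_0, remainder C_1, B = C_0 + C_1.
obs_sd = [sqrt(20.0), sqrt(45.0)]        # sigma(C_i)
true_sd = [4.0, 6.0]                     # sigma(C_Ti)
true_corr = [[1.0, 0.75], [0.75, 1.0]]   # rho(C_Ti, C_Tj)

# Uncorrelated errors imply Cov(C_0, C_1) = Cov(C_T0, C_T1).
r01 = true_sd[0] * true_sd[1] * true_corr[0][1] / (obs_sd[0] * obs_sd[1])
obs_corr = [[1.0, r01], [r01, 1.0]]

cov_ab = covariance_with_total(obs_sd, obs_corr, h=0)         # 20 + 18
cov_ab_true = covariance_with_total(true_sd, true_corr, h=0)  # 16 + 18
print(cov_ab, cov_ab_true)
```

In this made-up example the observed covariance is 38 and the true-score covariance is 34; the difference of 4 is the error variance of the subscore.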

In the analysis in this paper, the reliability estimates produced by testing programs are 
taken as given. For the cases under study, basic computations involve the KR-20 approach 
(Kuder & Richardson, 1937; Dressel, 1940) and the Kristof approach (Kristof, 1974). 

Direct Approximation

Given the summary measures just described, it is a relatively straightforward matter
to consider the approximation of the true subscore $A_T$. For a baseline to compare
approximations, consider the trivial prediction of $A_T$ by the constant $E(A)$. The
mean-squared error is then

$$\sigma^2(A_T) = E([A_T - E(A)]^2), \tag{3}$$

so that the root mean-squared error is $\sigma(A_T)$. If $A_T$ is approximated by the observed score
$A$, then the mean-squared error is

$$\sigma^2(A_E) = E([A - A_T]^2). \tag{4}$$

The root mean-squared error is

$$\sigma(A_E) = \sigma(A)[1 - \rho^2(A, A_T)]^{1/2}. \tag{5}$$

Alternatively, Kelley's formula may be applied, so that $A_T$ is approximated by

$$K = E(A) + \rho^2(A, A_T)[A - E(A)], \tag{6}$$

and the mean-squared error is

$$\sigma^2(K - A_T) = \rho^2(A, A_T)\sigma^2(A_E) = [1 - \rho^2(A, A_T)]\sigma^2(A_T) \tag{7}$$

(Kelley, 1947). The root mean-squared error is

$$\sigma(K - A_T) = \rho(A, A_T)\sigma(A_E) = \rho(A, A_T)[1 - \rho^2(A, A_T)]^{1/2}\sigma(A). \tag{8}$$

The proportional reduction of mean-squared error from use of $K$ rather than the constant
predictor $E(A)$ to predict $A_T$ is

$$\frac{\sigma^2(A_T) - \sigma^2(K - A_T)}{\sigma^2(A_T)} = \rho^2(A, A_T). \tag{9}$$
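The direct approximations can be illustrated with a short Python sketch. The function names are assumptions made for this example. The reliability 0.84 is the rounded Verbal I value reported under $K$ in Table 2, and the standard deviation 7.25 is an assumed value back-computed from the rounded $\sigma(A_E) = 2.9$ in Table 1, so the results are only approximate.

```python
from math import sqrt

def direct_estimates(sd, reliability):
    """RMSEs for estimating the true subscore A_T from the subscore A alone."""
    rmse_constant = sd * sqrt(reliability)           # sigma(A_T), predictor E(A)
    rmse_observed = sd * sqrt(1.0 - reliability)     # sigma(A_E), equation (5)
    rmse_kelley = sqrt(reliability) * rmse_observed  # sigma(K - A_T), equation (8)
    prmse_kelley = 1.0 - (rmse_kelley / rmse_constant) ** 2  # equation (9)
    return rmse_constant, rmse_observed, rmse_kelley, prmse_kelley

def kelley(score, mean, reliability):
    """Kelley's estimate K of the true score, equation (6)."""
    return mean + reliability * (score - mean)

# Rounded Verbal I figures; sd = 7.25 is an assumption (2.9 / sqrt(1 - 0.84)).
rmse_constant, rmse_observed, rmse_kelley, prmse_kelley = direct_estimates(7.25, 0.84)
print(round(rmse_observed, 1), round(rmse_kelley, 1), round(prmse_kelley, 2))

# Hypothetical examinee: observed subscore 30, mean subscore 20.
print(kelley(30.0, 20.0, 0.84))
```

With these inputs the root mean-squared error of $K$ rounds to the 2.7 shown for Verbal I in Table 1, and the proportional reduction equals the reliability, as (9) requires.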

Regression Approximation

Regression analysis may be employed to approximate the true subscore $A_T$ by the
observed total score $B$ (Wainer et al., 2001; Holland & Hoskens, 2003). The covariance

$$\mathrm{Cov}(A_T, B) = \mathrm{Cov}(A_T, B_T),$$

so that (1) and (2) imply that the prediction is

$$L = E(A) + \frac{\mathrm{Cov}(A_T, B_T)}{\sigma^2(B)}[B - E(B)] = E(A) + \rho(B, B_T)\rho(A_T, B_T)\frac{\sigma(A_T)}{\sigma(B)}[B - E(B)], \tag{10}$$

the mean-squared error is

$$\sigma^2(L - A_T) = \sigma^2(A_T) - \frac{[\mathrm{Cov}(A_T, B_T)]^2}{\sigma^2(B)} = [1 - \rho^2(B, B_T)\rho^2(A_T, B_T)]\sigma^2(A_T), \tag{11}$$

and the root mean-squared error is

$$\sigma(L - A_T) = [1 - \rho^2(B, B_T)\rho^2(A_T, B_T)]^{1/2}\sigma(A_T). \tag{12}$$

The proportional reduction in mean-squared error from use of $L$ rather than $E(A)$ to
predict $A_T$ is

$$\rho^2(A_T, B) = \frac{\sigma^2(A_T) - [1 - \rho^2(B, B_T)\rho^2(A_T, B_T)]\sigma^2(A_T)}{\sigma^2(A_T)} = \rho^2(B, B_T)\rho^2(A_T, B_T). \tag{13}$$


If $\sigma(L - A_T)$ is less than $\sigma(K - A_T)$, then use of the subscore $A$ by itself is very difficult
to justify for estimation of the true score $A_T$, for the true score $A_T$ in this instance is
better approximated by use of the regression based on the observed total score $B$ than
by use of the estimate derived from Kelley's formula from the observed subscore $A$. The
condition that $\sigma(L - A_T)$ is less than $\sigma(K - A_T)$ is equivalent to the condition that
$\rho(B, B_T)\rho(A_T, B_T)$ exceeds $\rho(A, A_T)$. Thus use of the total score rather than the subscore
is increasingly favored as the reliability of the total score increases, the correlation of true
subscore and true total score increases, and the reliability of the subscore decreases.
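The comparison just described reduces to a one-line check. The following Python sketch uses hypothetical function names and made-up reliabilities and correlations; it is meant only to show how the criterion behaves as the true-score correlation changes.

```python
from math import sqrt

def prmse_from_total(rel_total, true_corr):
    """Equation (13): rho^2(B, B_T) * rho^2(A_T, B_T)."""
    return rel_total * true_corr ** 2

def total_beats_subscore(rel_sub, rel_total, true_corr):
    """True when sigma(L - A_T) < sigma(K - A_T)."""
    return sqrt(rel_total) * true_corr > sqrt(rel_sub)

# Hypothetical inputs: subscore reliability 0.80, total-score reliability 0.92.
high = total_beats_subscore(0.80, 0.92, true_corr=0.95)  # highly correlated true scores
low = total_beats_subscore(0.80, 0.92, true_corr=0.80)   # more distinct subscore
print(high, low, round(prmse_from_total(0.92, 0.95), 3))
```

In this illustration the total score is the better predictor when the true scores correlate 0.95 but not when they correlate 0.80, which matches the qualitative statement above.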

One may also consider joint use of the observed subscore $A$ and the total score $B$ in
approximation of $A_T$. Use of $A$ and $B$ together is equivalent to use of $A$ and the remainder
score $C = B - A$, although some changes in formulas are required. The best linear predictor
of $A_T$ based on $A$ and $B$ is

$$M = E(A) + \beta[A - E(A)] + \gamma[B - E(B)], \tag{14}$$

where $\beta$ and $\gamma$ satisfy the normal equations

$$\beta\sigma^2(A) + \gamma\,\mathrm{Cov}(A, B) = \mathrm{Cov}(A_T, A) = \sigma^2(A_T)$$

and

$$\beta\,\mathrm{Cov}(A, B) + \gamma\sigma^2(B) = \mathrm{Cov}(A_T, B) = \mathrm{Cov}(A_T, B_T).$$

With a bit of algebra, one finds that

$$\gamma = \frac{\sigma(A)}{\sigma(B)}\rho(A, A_T)\tau, \tag{15}$$

where

$$\tau = \frac{\rho(B, B_T)\rho(A_T, B_T) - \rho(A, B)\rho(A, A_T)}{1 - \rho^2(A, B)}, \tag{16}$$

and

$$\beta = \rho(A, A_T)[\rho(A, A_T) - \rho(A, B)\tau]. \tag{17}$$

The mean-squared error is

$$\sigma^2(M - A_T) = \rho^2(A, A_T)\{1 - \rho^2(A, A_T) - \tau^2[1 - \rho^2(A, B)]\}\sigma^2(A). \tag{18}$$

The proportional reduction in mean-squared error from use of $M$ rather than $E(A)$ to
predict $A_T$ is then

$$\rho^2(A_T, M) = \rho^2(A, A_T) + \tau^2[1 - \rho^2(A, B)]. \tag{19}$$

The factor $1 - \rho^2(A, B)$ arises when (15) and (17) are substituted into the variance
$\beta\sigma^2(A_T) + \gamma\,\mathrm{Cov}(A_T, B_T)$ explained by $M$.

Obviously, $\sigma^2(M - A_T)$ is no greater than the minimum $\nu$ of $\sigma^2(L - A_T)$ and $\sigma^2(K - A_T)$.
If $\sigma^2(M - A_T)$ is substantially smaller than $\nu$, then $M$ is worthy of consideration.

All analysis may be reported in terms of $A$ and the remainder score $C$. For example,

$$M = E(A) + (\beta + \gamma)[A - E(A)] + \gamma[C - E(C)].$$
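A small numerical check of the joint predictor may be helpful. The Python sketch below solves the two normal equations directly and compares the result with the closed-form expressions in $\tau$. All function names and the variances and covariances are hypothetical; the mean-squared error is computed as $\sigma^2(A_T)$ minus the variance explained by $A$ and $B$, which follows from the normal equations.

```python
from math import sqrt

def joint_weights(var_a, var_b, cov_ab, var_at, cov_at_bt):
    """Solve the two normal equations for (beta, gamma) by Cramer's rule."""
    det = var_a * var_b - cov_ab ** 2
    beta = (var_at * var_b - cov_ab * cov_at_bt) / det
    gamma = (var_a * cov_at_bt - cov_ab * var_at) / det
    return beta, gamma

def closed_form_weights(var_a, var_b, cov_ab, var_at, cov_at_bt):
    """Same weights from the closed-form expressions in tau."""
    rho_a = sqrt(var_at / var_a)                    # rho(A, A_T)
    rho_ab = cov_ab / sqrt(var_a * var_b)           # rho(A, B)
    rho_b_rho_t = cov_at_bt / sqrt(var_at * var_b)  # rho(B, B_T) * rho(A_T, B_T)
    tau = (rho_b_rho_t - rho_ab * rho_a) / (1.0 - rho_ab ** 2)
    gamma = sqrt(var_a / var_b) * rho_a * tau
    beta = rho_a * (rho_a - rho_ab * tau)
    return beta, gamma

def joint_mse(var_at, cov_at_bt, beta, gamma):
    """MSE of M: sigma^2(A_T) minus the variance explained by A and B."""
    return var_at - (beta * var_at + gamma * cov_at_bt)

# Hypothetical variances and covariances (assumed for illustration only).
stats = dict(var_a=20.0, var_b=101.0, cov_ab=38.0, var_at=16.0, cov_at_bt=34.0)
beta, gamma = joint_weights(**stats)
beta_cf, gamma_cf = closed_form_weights(**stats)
mse_m = joint_mse(stats["var_at"], stats["cov_at_bt"], beta, gamma)
mse_k = (1.0 - stats["var_at"] / stats["var_a"]) * stats["var_at"]    # Kelley
mse_l = stats["var_at"] - stats["cov_at_bt"] ** 2 / stats["var_b"]    # total score
print(beta, gamma, mse_m, mse_k, mse_l)
```

For these made-up inputs the weights are 0.5625 and 0.125 and the mean-squared error of $M$ is 2.75, compared with 3.2 for $K$ and about 4.55 for $L$, so the joint predictor improves on both single-score predictors.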





Approximation of the True Residual 

It is also possible to examine the true residual

$$D_T = [A_T - E(A)] - \zeta[B_T - E(B)]. \tag{20}$$

By (2), the regression coefficient is

$$\zeta = \frac{\mathrm{Cov}(A_T, B_T)}{\sigma^2(B_T)} = \frac{\rho(A_T, B_T)\sigma(A_T)}{\sigma(B_T)}.$$

This residual is the difference between the true subscore $A_T$ and its best linear predictor
based on the true total score $B_T$. Thus $D_T$ provides a measure of the information provided
by the true subscore that is not provided by the true total score. A positive value of $D_T$
would indicate that expected performance on the subscore is better than expected from the
total score, while a negative value of $D_T$ suggests a weaker performance on the subscore
than predicted by the total score.

The trivial approximation of $D_T$ is the constant predictor 0 that corresponds to a true
subscore that is a linear function of the true total score. The mean-squared error is then

$$\sigma^2(D_T) = [1 - \rho^2(A_T, B_T)]\sigma^2(A_T), \tag{21}$$

so that the root mean-squared error is

$$\sigma(D_T) = [1 - \rho^2(A_T, B_T)]^{1/2}\sigma(A_T). \tag{22}$$

Note that $\sigma(D_T) \leq \sigma(L - A_T)$.

An alternative approximation is

$$D = [A - E(A)] - \zeta[B - E(B)]. \tag{23}$$

In this case,

$$D_E = D - D_T = A_E - \zeta B_E, \tag{24}$$

so that $E(D) = E(D_T) = E(D_E) = 0$, $D_T$ and $D_E$ are uncorrelated, and the mean-squared
error of $D$ is

$$\sigma^2(D_E) = \sigma^2(A_E) - 2\zeta\,\mathrm{Cov}(A_E, B_E) + \zeta^2\sigma^2(B_E). \tag{25}$$

To evaluate the mean-squared error, note that

$$\mathrm{Cov}(A_E, B_E) = \mathrm{Cov}(A, B) - \mathrm{Cov}(A_T, B_T).$$

Kelley's formula can be applied here as well. If

$$F = \rho^2(D, D_T)D, \tag{26}$$

then

$$\sigma(F - D_T) = \rho(D, D_T)\sigma(D_E). \tag{27}$$

In (27),

$$\rho^2(D, D_T) = \frac{\sigma^2(D_T)}{\sigma^2(D)} = \frac{\sigma^2(D_T)}{\sigma^2(D_T) + \sigma^2(D_E)}.$$

Note that $\rho^2(D, D_T)$ is the reliability of $D$.
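The residual calculations can be collected into one routine. The Python sketch below is a hedged illustration: the function name `residual_analysis` and all numerical inputs are hypothetical, and the inputs are assumed to come from a consistent set of observed and true-score variances and covariances.

```python
from math import sqrt

def residual_analysis(var_a, var_b, cov_ab, var_at, var_bt, cov_at_bt):
    """RMSEs for the true residual D_T under the predictors 0, D, and F."""
    coef = cov_at_bt / var_bt                    # regression coefficient in (20)
    var_dt = var_at - cov_at_bt ** 2 / var_bt    # equation (21)
    var_ae = var_a - var_at                      # sigma^2(A_E)
    var_be = var_b - var_bt                      # sigma^2(B_E)
    cov_ae_be = cov_ab - cov_at_bt               # Cov(A_E, B_E)
    var_de = var_ae - 2.0 * coef * cov_ae_be + coef ** 2 * var_be  # equation (25)
    reliability = var_dt / (var_dt + var_de)     # reliability of D
    rmse_f = sqrt(reliability * var_de)          # equation (27)
    return coef, sqrt(var_dt), sqrt(var_de), reliability, rmse_f

# Hypothetical inputs (assumed for illustration only).
coef, sd_dt, sd_de, reliability, rmse_f = residual_analysis(
    var_a=20.0, var_b=101.0, cov_ab=38.0,
    var_at=16.0, var_bt=88.0, cov_at_bt=34.0,
)
print(round(sd_dt, 2), round(sd_de, 2), round(reliability, 2), round(rmse_f, 2))
```

In this made-up case the reliability of $D$ is roughly 0.50, so $F$ removes about half of the mean-squared error of the trivial predictor 0. The root mean-squared error of $F$ is never larger than either $\sigma(D_T)$ or $\sigma(D_E)$.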


2 Examples 

To illustrate application of subscores, consider data from the October 2002
administration of the SAT® I examination (Feigenbaum & Hammond, 2003). Results are
summarized in Tables 1, 2, 3, and 4. In these tables, Verbal I, Verbal II, and Verbal III refer
to the three separate portions of the SAT verbal examination, which are interleaved with
Math I, Math II, and Math III, the three separate portions of the SAT math examination.
An alternative breakdown of the SAT verbal uses critical reading (CR), analogies (A),
and sentence completion (SC). Similarly, an alternate decomposition of SAT math uses
four-choice math multiple choice (Math 4c), five-choice multiple choice (Math 5c), and 
student-produced math responses (Math S). To examine these tables, recall formulas (5), 
(8), (9), (12), (13), (14), (17), (15), (19), (22), (25), and (27). In Table 2, proportional 
reduction in mean-squared error is relative to use of a constant predictor equal to the 
expected subscore. In Table 4, the proportional reduction calculation is relative to use of 
the constant 0. 

In these tables, for any given line, the subscore is $A$ and the total score is $B$. For
example, let $A$ be the subscore of the first verbal section (10 sentence completions, 13
analogies, and a 13-item reading passage), and let $B$ be the total score for the 78-item
verbal test. Then $\sigma(A_E)$ is estimated to be 2.9, while $\sigma(K - A_T)$ is estimated to be 2.7.
In this case, the subscore is clearly unsatisfactory relative to the approximation $L$ based
on the total score, for $\sigma(L - A_T)$ is estimated to be 2.0, a somewhat smaller figure than
is available from $A$ itself. Use of $M$ yields only a slight reduction in root mean-squared
error, for $\sigma(M - A_T)$ is also 2.0 if two significant figures are used. The weight $\beta$ assigned to
the subscore $A$ is only 0.13. Both $L$ and $M$ are quite respectable estimates for $A_T$, for the
proportional reductions in mean-squared error are both 0.91 to two significant figures. In
the case of the residual estimates, $\sigma(D_T)$ is estimated to be 0.8, $\sigma(D_E)$ has estimate 2.2,
and $\sigma(F - D_T)$ has estimate 0.7, so that there is little gain from use of $F$ or $D$ instead of
the estimate 0 for $D_T$. Note that the proportional reduction in mean-squared error from
use of $F$ rather than 0 is only 0.11.

Similar results apply to the other sections of the verbal examination, and similar results
also apply if $A$ is a section of and $B$ is the total score for the math examination. The
variation in the coefficient $\gamma$ in Table 1 mostly reflects the relative lengths of sections.

In summary, none of the reported subscores of SAT I math or SAT I verbal provides any 
appreciable information concerning an examinee that is not already provided by the total 
score. 

On the other hand, the analysis here would certainly support use of separate math and
verbal scores. Let $A$ be the math total, and let $B$ be the sum of the math and verbal totals.
In this case, $\sigma(A_E) = 3.7$, $\sigma(K - A_T) = 3.6$, and $\sigma(L - A_T) = 5.4$, so that the math true
score is much less well predicted by the combined total score than by the math score.
Similarly, for the verbal score, $\sigma(A_E) = 4.6$, $\sigma(K - A_T) = 4.4$, and $\sigma(L - A_T) = 5.7$. There
is little value in use of the joint predictor $M$. Here $\sigma(M - A_T)$ is 3.4 for math and 4.2 for
verbal.

In the case of residual analysis, for math, $\sigma(D_T)$ is 4.7, $\sigma(D_E)$ is 2.9, and $\sigma(F - D_T)$ is
2.4. Because the total score is the sum of the math and verbal subscores, the same results
apply for the verbal test. The estimated proportional reduction in mean-squared error from
use of $F$ rather than 0 is the estimated reliability coefficient 0.73 in both cases. Thus use
of $F$ does provide a substantial gain over the trivial estimate of 0. The root mean-squared

Table 1.

Root Mean-Squared Errors for True Score Estimation for SAT Subscores

Subscore    Items in subscore  Total score  $\sigma(A_E)$  $\sigma(K - A_T)$  $\sigma(L - A_T)$  $\sigma(M - A_T)$  $\beta$  $\gamma$
Verbal I    36                 Verbal       2.9             2.7                 2.0                 2.0                 0.13     0.35
Verbal II   30                 Verbal       2.8             2.5                 1.6                 1.6                 -0.02    0.35
Verbal III  12                 Verbal       1.8             1.5                 1.1                 1.0                 0.19     0.13
CR          40                 Verbal       3.4             3.2                 2.6                 2.5                 0.16     0.39
A           19                 Verbal       2.1             1.8                 1.3                 1.2                 0.17     0.17
SC          19                 Verbal       2.1             1.8                 1.3                 1.2                 0.20     0.18
Math I      25                 Math         2.3             2.1                 1.7                 1.6                 0.16     0.35
Math II     25                 Math         2.3             2.1                 1.5                 1.5                 0.07     0.33
Math III    10                 Math         1.5             1.2                 0.7                 0.7                 0.06     0.13
Math 4c     15                 Math         1.9             1.6                 0.9                 0.9                 0.03     0.21
Math 5c     35                 Math         2.7             2.6                 2.1                 2.1                 0.10     0.50
Math S      10                 Math         1.2             1.1                 0.7                 0.7                 0.16     0.12
Verbal      78                 Total        4.6             4.4                 5.7                 4.2                 0.70     0.13
Math        60                 Total        3.7             3.6                 5.4                 3.4                 0.76     0.09


Table 2.

Proportional Reduction of Mean-Squared Error Achieved by True Score Estimation for SAT Subscores

Subscore    Total score  $K$    $L$    $M$
Verbal I    Verbal       0.84   0.91   0.91
Verbal II   Verbal       0.80   0.92   0.92
Verbal III  Verbal       0.72   0.86   0.87
CR          Verbal       0.84   0.89   0.90
A           Verbal       0.74   0.87   0.88
SC          Verbal       0.78   0.88   0.89
Math I      Math         0.87   0.92   0.92
Math II     Math         0.83   0.91   0.91
Math III    Math         0.64   0.89   0.89
Math 4c     Math         0.72   0.91   0.91
Math 5c     Math         0.89   0.93   0.93
Math S      Math         0.73   0.89   0.90
Verbal      Total        0.91   0.85   0.92
Math        Total        0.92   0.82   0.93

Table 3.

Root Mean-Squared Error for Residual Estimation for SAT Subscores

Subscore    Total score  $\sigma(D_T)$  $\sigma(D_E)$  $\sigma(F - D_T)$
Verbal I    Verbal       0.8             2.2             0.7
Verbal II   Verbal       0.3             2.2             0.3
Verbal III  Verbal       0.8             1.7             0.7
CR          Verbal       1.3             2.3             1.2
A           Verbal       0.8             1.9             0.7
SC          Verbal       0.8             1.8             0.7
Math I      Math         0.5             1.8             0.5
Math II     Math         0.6             1.8             0.6
Math III    Math         0.4             1.4             0.4
Math 4c     Math         0.5             1.6             0.5
Math 5c     Math         0.5             1.8             0.5
Math S      Math         0.4             1.2             0.4
Verbal      Total        4.7             2.9             2.4
Math        Total        4.7             2.9             2.4


Table 4.

Proportional Reduction of Mean-Squared Error Achieved by Residual Estimation for SAT Subscores

Subscore    Total score  $F$
Verbal I    Verbal       0.11
Verbal II   Verbal       0.02
Verbal III  Verbal       0.18
CR          Verbal       0.24
A           Verbal       0.16
SC          Verbal       0.15
Math I      Math         0.09
Math II     Math         0.11
Math III    Math         0.08
Math 4c     Math         0.08
Math 5c     Math         0.07
Math S      Math         0.12
Verbal      Total        0.73
Math        Total        0.73





error from use of $F$ is about half the corresponding root mean-squared error from use of 0.
On the other hand, the proportional reduction of mean-squared error of 0.73 from use of
$F$ to assess deviation of SAT I math from the value expected by SAT I total is somewhat
smaller than the proportional reduction in mean-squared error of 0.92 associated with use
of $K$ for estimation of SAT I math.

One may argue that it is unreasonable to expect very much information from
subscores in the SAT I math and verbal examinations. The SAT I math and SAT I verbal
examinations measure relatively limited content areas. On the other hand, some Praxis™
examinations contain parts that test very distinct content areas. For instance, consider the 
test titled Fundamental Subjects: Content Knowledge with code 0511 (Grant, 2003). This 
test measures English language arts (E), mathematics (M), citizenship and social science 
(C), and science (S). Each area is measured with 25 multiple-choice items, and the total 
raw score is the sum of the scores for each area. Results are summarized in Tables 5, 6, 7, 
and 8. 

Here the direct estimate $K$ of true subscore is roughly comparable to the estimate $L$
of true subscore derived from the total score. Use of $M$ provides a modest but appreciable
improvement in all cases. Results are best for mathematics, and in all cases a relatively
substantial weight is given to the direct estimate. With $M$, proportional reductions of
mean-squared error are around 0.8, so that estimation of the true subscores by use of $M$ can
be regarded as relatively successful; however, the proportional reductions in error achieved
from $M$ are somewhat smaller than those achieved with $M$ for subscores of SAT I math
or verbal. The essential issue would appear to be that the subscores are less accurately
predicted by total score in the Praxis case.

For residual analysis, appreciable gains over estimation of $D_T$ by 0 are only seen for
English language arts and mathematics, and even here the gains require use of $F$. The
proportional reductions in mean-squared error reported in Table 8 are all relatively modest.

Table 5.

Root Mean-Squared Error for True Score Estimation for Praxis Subscores

Subscore  $\sigma(A_E)$  $\sigma(K - A_T)$  $\sigma(L - A_T)$  $\sigma(M - A_T)$  $\beta$  $\gamma$
E         1.7             1.5                 1.5                 1.3                 0.44     0.10
M         1.9             1.7                 1.9                 1.5                 0.51     0.12
C         1.8             1.5                 1.3                 1.2                 0.60     0.03
S         2.0             1.7                 1.3                 1.3                 0.57     0.05


Table 6.

Proportional Reduction of Mean-Squared Error Achieved by True Score Estimation for Praxis Subscores

Subscore  $K$    $L$    $M$
E         0.73   0.70   0.80
M         0.79   0.73   0.83
C         0.68   0.77   0.81
S         0.69   0.80   0.82


Table 7.

Root Mean-Squared Error for Residual Estimation for Praxis Subscores

Subscore  $\sigma(D_T)$  $\sigma(D_E)$  $\sigma(F - D_T)$
E         1.3             1.5             1.0
M         1.6             1.6             1.1
C         1.0             1.6             0.8
S         1.0             1.7             0.9


Table 8.

Proportional Reduction of Mean-Squared Error Achieved by Residual Estimation for Praxis Subscores

Subscore  $F$
E         0.43
M         0.48
C         0.29
S         0.25

3 Conclusions


The methods of subscore analysis proposed are very easily implemented and provide 
a rational criterion for assessing the value of subscores. Results suggest that a good deal 
of caution is needed. Subscores are most likely to have value if they have relatively high 
reliability by themselves and if the true subscore and true total score have only a moderate 
correlation. Both conditions are important. The SAT subscores are relatively unsuccessful 
due to the very high correlations of their true scores with the true total score; however, 
many of the subscores are rather reliable. Appropriate approximations of the true subscore 
give very high weight to the total score. The Praxis subscores are often less reliable than 
are many of the SAT subscores, but the correlation of true subscores to true total score 
is somewhat more modest than for the SAT subscores. Nonetheless, even for the Praxis 
subscores, which are all based on 25 items and measure very different content areas, the 
subscores are best used when combined with the total score, and the reliability of the 
resulting combination M is somewhat less than for the total score. Although the results 
here do not prove that subscores cannot be useful, they do suggest that claims for the 
value of subscores should be treated skeptically and should be verified by use of procedures 
similar to those in this report. 

This report emphasizes simple approaches to subscores. It is possible that alternatives 
can be constructed that are quite attractive in particular applications. For example, 
subscore predictions from total scores may be based on use of log-linear models or use of 
item-response theory. Thus additional work can be considered to aid in subscore assessment. 





References 


Dressel, P. L. (1940). Some remarks on the Kuder-Richardson 20 (KR-20) reliability statistic 
for formula scored tests. Psychometrika, 5, 305-310. 

Feigenbaum, M., & Hammond, S. (2003). Test analysis, College Board, SAT® I: Reasoning
Test, Fall 2002 administrations, 3YSA03-3YSA05 (Report No. SR-2003-37).
Princeton, NJ: ETS.

Grant, M. (2003). Fundamental subjects: Content knowledge (0511), test analysis, form 
3ypxl (Report No. SR-2003-62). Princeton, NJ: ETS. 

Holland, P. W., & Hoskens, M. (2003). Classical test theory as a first-order item response
theory: Application to true-score prediction from a possibly nonparallel test.
Psychometrika, 68, 123-149.

Kelley, T. L. (1947). Fundamentals of statistics. Cambridge, MA: Harvard University Press.

Kristof, W. (1974). Estimation of reliability and true score variance from a split of a test
into three arbitrary parts. Psychometrika, 39, 491-499.

Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. 
Psychometrika, 2, 151-160. 

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, 
MA: Addison-Wesley. 

Wainer, H., Vevea, J. L., Camacho, F., Reeve, B. B., Swygert, K. A., & Thissen, D. (2001). 
Augmented scores—“Borrowing strength” to compute scores based on small numbers 
of items. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 343-387). Mahwah, NJ: 
Erlbaum. 





